
    Tail index estimation, concentration and adaptivity

    This paper presents an adaptive version of the Hill estimator based on Lepski's model selection method. This simple data-driven index selection method is shown to satisfy an oracle inequality and to achieve the lower bound recently derived by Carpentier and Kim. In order to establish the oracle inequality, we derive non-asymptotic variance bounds and concentration inequalities for Hill estimators. These concentration inequalities are derived from Talagrand's concentration inequality for smooth functions of independent exponentially distributed random variables, combined with three tools of Extreme Value Theory: the quantile transform, Karamata's representation of slowly varying functions, and Rényi's characterisation of the order statistics of exponential samples. The performance of this computationally and conceptually simple method is illustrated using Monte Carlo simulations.
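
    For reference, the Hill estimator itself is a one-line formula: with the sample sorted in decreasing order, the estimate built on the k largest order statistics is the average of log(X_(i)/X_(k+1)) over i = 1, ..., k. The Python sketch below pairs it with a generic Lepski-type selection rule; the confidence-radius constant c and the grid of candidate k values are illustrative assumptions, not the tuned procedure of the paper.

        import numpy as np

        def hill(x, k):
            # Hill estimator from the k largest order statistics:
            # (1/k) * sum_{i=1}^k log(X_(i) / X_(k+1))
            xs = np.sort(x)[::-1]
            return np.mean(np.log(xs[:k] / xs[k]))

        def lepski_hill(x, ks, c=1.0):
            # Generic Lepski-type rule (illustrative constant c): keep the
            # largest k whose estimate stays within the confidence radius
            # c * sqrt(log(n) / k') of every estimate with smaller k'
            # (smaller bias, larger variance).
            n = len(x)
            est = {k: hill(x, k) for k in ks}
            chosen = ks[0]
            for k in ks:
                if all(abs(est[k] - est[kp]) <= c * np.sqrt(np.log(n) / kp)
                       for kp in ks if kp <= k):
                    chosen = k
            return chosen, est[chosen]

        # Pareto sample with tail index 2: the extreme value index is 1/2.
        rng = np.random.default_rng(0)
        sample = rng.pareto(2.0, size=10_000) + 1.0
        k_star, gamma_hat = lepski_hill(sample, ks=[2**j for j in range(4, 12)])
        print(k_star, gamma_hat)  # gamma_hat should land near 0.5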

    About adaptive coding on countable alphabets

    This paper sheds light on universal coding with respect to classes of memoryless sources over a countable alphabet defined by an envelope function with finite and non-decreasing hazard rate. We prove that the auto-censuring AC code introduced by Bontemps (2011) is adaptive with respect to the collection of such classes. The analysis builds on the tight characterization of universal redundancy rates in terms of the metric entropy of small source classes due to Opper and Haussler (1997), and on a careful analysis of the performance of the AC-coding algorithm. The latter relies on non-asymptotic bounds for maxima of samples from discrete distributions with finite and non-decreasing hazard rate.
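
    For a discrete distribution $(p_j)_{j\geq 1}$ on the positive integers, the hazard rate at $j$ is $p_j / \sum_{i\geq j} p_i$, and the envelope classes above require it to be finite and non-decreasing. A minimal Python check of that property, using a truncated geometric envelope as an illustrative example:

        import numpy as np

        def hazard_rate(p):
            # Discrete hazard rate: h_j = p_j / P(X >= j).
            tail = np.cumsum(p[::-1])[::-1]   # tail[j] = sum_{i >= j} p_i
            return p / tail

        # Geometric envelope p_j proportional to (1/2)^j (illustrative):
        # its hazard rate is constant, hence (weakly) non-decreasing.
        j = np.arange(1, 40)
        p = 0.5 ** j
        p /= p.sum()
        h = hazard_rate(p)
        print(np.all(np.diff(h) >= -1e-12))   # True: non-decreasing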

    Model selection and error estimation

    We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function, and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
    Keywords: complexity regularization, model selection, error estimation, concentration of measure.
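
    The label-flipping computation mentioned above can be made concrete: flipping the labels on one half of the training set and minimizing the empirical 0-1 risk yields the maximal discrepancy via the identity D = 1 - 2 * (minimal flipped risk). The Python sketch below estimates the expected maximal discrepancy by Monte Carlo over random splits; the threshold-classifier class and data distribution are illustrative stand-ins for whatever model class is being penalized.

        import numpy as np

        def max_discrepancy(x, y, thresholds):
            # Maximal discrepancy of the class {x -> sign(x - t)}:
            # computed as ERM with the labels on the first half flipped,
            # using the identity D = 1 - 2 * min_f (flipped empirical risk).
            n = len(x)
            y_flip = y.copy()
            y_flip[: n // 2] *= -1            # flip the first-half labels
            risks = [np.mean(np.sign(x - t) != y_flip) for t in thresholds]
            return 1.0 - 2.0 * min(risks)

        # Monte Carlo estimate of the expected maximal discrepancy on a
        # fixed training sample, averaging over random splits.
        rng = np.random.default_rng(0)
        x = rng.normal(size=200)
        y = np.where(x + 0.5 * rng.normal(size=200) > 0, 1.0, -1.0)
        grid = np.linspace(-3.0, 3.0, 61)
        vals = []
        for _ in range(500):
            perm = rng.permutation(len(x))
            vals.append(max_discrepancy(x[perm], y[perm], grid))
        print(np.mean(vals))   # usable as a data-based complexity penalty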

    Sharp threshold for percolation on expanders

    We study the appearance of the giant component in random subgraphs of a given large finite graph G=(V,E) in which each edge is present independently with probability p. We show that if G is an expander with vertices of bounded degree, then for any c in (0,1), the property that the random subgraph contains a giant component of size c|V| has a sharp threshold.
    Published in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/10-AOP610.
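
    The sharpness of the threshold can be seen empirically: keep each edge of a bounded-degree expander with probability p and watch the relative size of the largest component jump as p crosses a critical value. The Python sketch below (using the networkx library) does this for a random 4-regular graph, a standard stand-in for a bounded-degree expander; the graph model and parameters are illustrative choices, not taken from the paper.

        import networkx as nx
        import numpy as np

        def giant_fraction(G, p, rng):
            # Keep each edge independently with probability p and
            # return |largest component| / |V| of the random subgraph.
            H = nx.Graph()
            H.add_nodes_from(G.nodes())
            H.add_edges_from(e for e in G.edges() if rng.random() < p)
            return max(len(c) for c in nx.connected_components(H)) / G.number_of_nodes()

        # A random 4-regular graph is an expander with high probability.
        rng = np.random.default_rng(0)
        G = nx.random_regular_graph(4, 5000, seed=0)
        for p in np.linspace(0.1, 0.6, 11):
            frac = np.mean([giant_fraction(G, p, rng) for _ in range(5)])
            print(f"p = {p:.2f}   giant fraction = {frac:.3f}")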

    Coding on countably infinite alphabets

    This paper describes universal lossless coding strategies for compressing sources on countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal distribution provide benchmarks for coding techniques originating from the theory of universal coding over finite alphabets. We prove general upper bounds on minimax regret and lower bounds on minimax redundancy for such source classes. The general upper bounds emphasize the role of the Normalized Maximum Likelihood codes with respect to minimax regret in the infinite-alphabet context. Lower bounds are derived by tailoring sharp bounds on the redundancy of Krichevsky-Trofimov coders for sources over finite alphabets. Up to logarithmic (resp. constant) factors, the bounds are matching for source classes defined by algebraically declining (resp. exponentially vanishing) envelopes. Effective and (almost) adaptive coding techniques are described for the collection of source classes defined by algebraically vanishing envelopes. Those results extend our knowledge concerning universal coding to contexts where the key tools from parametric inference are known to fail.
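
    For reference, the two minimax quantities contrasted above admit standard definitions: for a class $\Lambda$ of memoryless sources and a coding probability $Q_n$ on $n$-strings,

        \[
        R^{+}(\Lambda, n) = \inf_{Q_n} \sup_{P \in \Lambda} D\!\left(P^{\otimes n} \,\|\, Q_n\right)
        \quad \text{(minimax redundancy)},
        \]
        \[
        R^{*}(\Lambda, n) = \inf_{Q_n} \sup_{x_{1:n}} \log \frac{\sup_{P \in \Lambda} P^{\otimes n}(x_{1:n})}{Q_n(x_{1:n})}
        \quad \text{(minimax regret)},
        \]
        \[
        Q_n^{\mathrm{NML}}(x_{1:n}) = \frac{\sup_{P \in \Lambda} P^{\otimes n}(x_{1:n})}{\sum_{y_{1:n}} \sup_{P \in \Lambda} P^{\otimes n}(y_{1:n})}.
        \]

    The Normalized Maximum Likelihood (Shtarkov) distribution achieves the minimax regret whenever its normalizing sum is finite; over an infinite alphabet this normalizer may diverge, which is where envelope conditions come into play.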

    A sharp concentration inequality with applications

    We present a new general concentration-of-measure inequality and illustrate its power by applications in random combinatorics. The results find direct applications in some problems of learning theory.
    Keywords: concentration of measure, Vapnik-Chervonenkis dimension, logarithmic Sobolev inequalities, longest monotone subsequence, model selection.

    Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications

    An infinite urn scheme is defined by a probability mass function $(p_j)_{j\geq 1}$ over positive integers. A random allocation consists of a sample of $N$ independent drawings according to this probability distribution, where $N$ may be deterministic or Poisson-distributed. This paper is concerned with occupancy counts, that is, with the number of symbols with $r$ or at least $r$ occurrences in the sample, and with the missing mass, that is, the total probability of all symbols that do not occur in the sample. Without any further assumption on the sampling distribution, these random quantities are shown to satisfy Bernstein-type concentration inequalities. The variance factors in these concentration inequalities are shown to be tight if the sampling distribution satisfies a regular variation property. This regular variation property reads as follows. Let the number of symbols with probability larger than $x$ be $\vec{\nu}(x) = |\{j : p_j \geq x\}|$. In a regularly varying urn scheme, $\vec{\nu}$ satisfies $\lim_{\tau \to 0} \vec{\nu}(\tau x)/\vec{\nu}(\tau) = x^{-\alpha}$ for $\alpha \in [0,1]$, and the variance of the number of distinct symbols in a sample tends to infinity as the sample size tends to infinity. Among other applications, these concentration inequalities allow us to derive tight confidence intervals for the Good-Turing estimator of the missing mass.
    Published in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm), DOI: http://dx.doi.org/10.3150/15-BEJ743.
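
    The occupancy counts and the Good-Turing estimator are both one-liners once the sample is tallied: $K_r$ is the number of symbols seen exactly $r$ times, and the Good-Turing estimate of the missing mass is $K_1/n$. A minimal Python sketch, with a geometric sampling distribution as an illustrative urn:

        import numpy as np
        from collections import Counter

        def occupancy_and_good_turing(sample):
            # counts[j] = number of occurrences of symbol j in the sample;
            # K[r]      = number of distinct symbols seen exactly r times;
            # Good-Turing estimator of the missing mass: K_1 / n.
            counts = Counter(sample)
            K = Counter(counts.values())
            return counts, K, K[1] / len(sample)

        # Illustrative urn: geometric distribution p_j = (1/2)^j, j >= 1.
        rng = np.random.default_rng(0)
        n = 10_000
        sample = rng.geometric(0.5, size=n).tolist()
        counts, K, gt = occupancy_and_good_turing(sample)

        # Actual missing mass for this (known) urn, for comparison.
        missing = sum(0.5**j for j in range(1, 80) if j not in counts)
        print(f"Good-Turing: {gt:.5f}   actual missing mass: {missing:.5f}")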

    Codage à protections inégales et diffusion (Unequal error protection coding and broadcasting)

    Audio-visual signals transmitted over computer networks are the product of source coding that prioritizes the different parts of the encoded message. In an environment such as the Internet, it is important to design channel codes that take the specifics of the source coding into account and make the data stream resilient to the main network failure mode: packet losses. We determine the information-theoretic limits of unequal protection coding against erasures. By reinterpreting a recent inequality on unequal protection codes, we first give a simple characterization of the achievable rates on an erasure broadcast channel with degraded messages. We then show the optimality of multiplexing by juxtaposition and interleaving. Finally, we discuss the problem of graceful degradation on a broadcast channel and show that when the rate-distortion function depends logarithmically on the distortion, the existence of an approximate successive-refinement scheme can be guaranteed.
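
    For context, the canonical rate-distortion function with logarithmic dependence on the distortion is that of a Gaussian source under squared-error distortion, which is also the textbook example of a successively refinable source:

        \[
        R(D) = \frac{1}{2} \log_2 \frac{\sigma^2}{D}, \qquad 0 < D \le \sigma^2 .
        \]

    Splitting a rate budget $R = R_1 + R_2$ across two stages then yields distortions $D_1 = \sigma^2 2^{-2R_1}$ and $D_2 = \sigma^2 2^{-2(R_1+R_2)}$, each stage refining the previous description without rate loss.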